Enriching for correct prediction of biological processes using a combination of diverse classifiers

Author: A Lagreid
A Mateos
BD Ripley
Brad Windle
C Cortes
CC Chang
CW Hsu
D Ko
D Wolpert
Daijin Ko
DT Ross
DW Huang
H Lan
K Ting
L Breiman
L Kuncheva
MB Eisen
MP Brown
P Coyle
S Dzeroski
T Hastie
T Robertson
V Vapnik
VR Iyer
W Zhang
WN Venables
Y Guan
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract

Background: Machine learning models (classifiers) for assigning genes to biological processes each differ in which genes they can classify and to which biological processes. No single learning model is qualitatively superior to any other, and the overall precision of each model tends to be low. The classification results of the classifiers can be complementary and synergistic, which suggests a benefit from combining algorithms, but the prediction probability outputs of different learning models are often neither comparable nor compatible for combining. A means is needed to compare outputs regardless of the model and data used and to combine the results into an improved comprehensive model.

Results: Gene expression patterns from NCI's panel of 60 cell lines were used to train a Random Forest, a Support Vector Machine and a Neural Network model, plus two over-sampled models, for classifying genes to biological processes. Each model produced unique characteristics in the classification results. We introduce the Precision Index measure (PIN), derived from the maximum posterior probability, which allows assessing, comparing and combining multiple classifiers. The class specific precision measure (PIC) is introduced and used to select a subset of predictions across all classes and all classifiers with high precision. We developed a single classifier that combines the PINs from these five models in prediction and found that the PIN Combined Classifier (PINCom) significantly increased the number of correctly predicted genes over any single classifier. The PINCom applied to test genes that were not used in training also showed substantial improvement over any single model.

Conclusions: This paper introduces novel and effective ways of assessing predictions by their precision and recall, plus a method that combines several machine learning models and capitalizes on synergy and complementation in class selection, resulting in higher precision and recall. Different machine learning models yielded incongruent results, which were successfully combined into one superior model using the PIN measure we developed. Validation of the boosted predictions for gene functions showed the genes to be accurately predicted.
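The abstract does not give the formula for the PIN, so the following Python sketch is only one possible reading, not the authors' method. It assumes that a precision index can be approximated by the empirical precision a classifier achieved on held-out genes whose maximum posterior probability fell in the same equal-width bin, and that a combined classifier can simply keep the prediction with the highest index. The names fit_precision_index and combine_by_pin, the binning scheme, and all numbers are hypothetical.

```python
def fit_precision_index(max_probs, correct, n_bins=2):
    """Return a function mapping a maximum posterior probability to the
    empirical precision observed in the same equal-width bin.

    Assumption: this binned estimate stands in for the PIN, whose exact
    definition is not stated in the abstract.
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, ok in zip(max_probs, correct):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        totals[b] += 1
        hits[b] += int(ok)
    # Bins without validation data get an index of 0.0 (a conservative choice).
    table = [h / t if t else 0.0 for h, t in zip(hits, totals)]

    def pin(p):
        return table[min(int(p * n_bins), n_bins - 1)]

    return pin


def combine_by_pin(predictions, pin_functions):
    """Pick the prediction with the highest precision index.

    predictions: {model_name: (predicted_class, max_posterior_probability)}
    Returns (predicted_class, index, model_name); ties keep the first model.
    """
    best = None
    for name, (label, p) in predictions.items():
        score = pin_functions[name](p)
        if best is None or score > best[1]:
            best = (label, score, name)
    return best


# Hypothetical validation results for two models (not data from the paper).
pins = {
    "rf": fit_precision_index([0.9, 0.8, 0.4, 0.3], [True, True, True, False]),
    "svm": fit_precision_index([0.9, 0.8, 0.7, 0.3], [True, True, False, False]),
}

# The SVM reports the higher raw probability, but the random forest's
# predictions at this probability level were more precise on validation data.
choice = combine_by_pin({"rf": ("class A", 0.6), "svm": ("class B", 0.95)}, pins)
print(choice)
```

The point of the sketch is that raw posterior probabilities from different models sit on different scales, whereas a precision estimate is comparable across models, which is the property the abstract attributes to the PIN.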

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

VCU Scholars Compass

Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

Author: A Drawid
A Lagreid
A Tanay
AC Gavin
AJ Enright
B Schwikowski
CJ Roberts
EM Marcotte
GD Bader
HJ Bussemaker
HW Mewes
I Cherel
J Ihmels
Jianghui Xiong
Kunyi Luo
LF Wu
M Ashburner
M Deng
M Pellegrini
MB Eisen
MC von
MP Brown
OG Troyanskaya
P Jorgensen
P Uetz
PT Spellman
R Kohavi
R Overbeek
SF Altschul
Shanguang Chen
Simon Rayner
T Ito
TR Hazbun
TR Hughes
U Karaoz
WK Huh
WR Pearson
X Zhou
Y Chen
Y Ho
Yinghui Li
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP), which uses the Gene Ontology Index to relate diverse sources of experimental data by creating an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to add almost any experimental data set that is related to protein function and incorporates the Gene Ontology to our evidence data, in order to seek relationships between the different sets.

RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously unannotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation.

CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight into how certain biological processes are implemented by the interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central